
[OpenVINO] Support Qwen3.5, Qwen3.5-MoE and Qwen3.6 #1689

Merged
rkazants merged 229 commits into huggingface:main from rkazants:support_qwen3_5 on May 5, 2026

Conversation

@rkazants
Collaborator

@rkazants rkazants commented Apr 15, 2026

What does this PR do?

Re-created PR #1634

Fixes 181271, 181280, 182003

Installation instructions:

pip install -U git+https://github.com/rkazants/optimum-intel.git@support_qwen3_5
pip install --pre -U openvino openvino-tokenizers nncf --extra-index-url https://storage.openvinotoolkit.org/simple/wheels/nightly
pip install transformers==5.2.0
pip install requests torchvision opencv-python

Export command:

optimum-cli export openvino -m Qwen/Qwen3.5-0.8B Qwen3.5-0.8B
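
The same export can also be driven from the Python API; a minimal sketch under the same setup (the output directory name is an assumption):

```python
from optimum.intel.openvino import OVModelForVisualCausalLM

# Convert the Transformers checkpoint to OpenVINO IR on the fly
# (roughly equivalent to the optimum-cli command above).
model = OVModelForVisualCausalLM.from_pretrained("Qwen/Qwen3.5-0.8B", export=True)

# Save the converted model so it can be reloaded later without re-exporting.
model.save_pretrained("Qwen3.5-0.8B")
```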

Inference script:

from transformers import AutoProcessor
from transformers.video_utils import load_video
from huggingface_hub import hf_hub_download
from optimum.intel.openvino import OVModelForVisualCausalLM

# Directory produced by the optimum-cli export command above
model_dir = "Qwen3.5-0.8B"

processor = AutoProcessor.from_pretrained(model_dir)
model = OVModelForVisualCausalLM.from_pretrained(model_dir)

# Prepare video input
video_path = hf_hub_download(
                repo_id="raushan-testing-hf/videos-test",
                filename="sample_demo_1.mp4",
                repo_type="dataset",
            )
input_video, _ = load_video(video_path, num_frames=10, backend="opencv")

messages = [
    {"role": "user", "content": [
        {"type": "video"},
        {"type": "text", "text": "Why is this video funny?"},
    ]}
]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], videos=[input_video], return_tensors="pt")

# Run inference
output_ids = model.generate(**inputs, max_new_tokens=100)
output_text = processor.decode(output_ids[0], skip_special_tokens=True)

print(output_text)
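
If only the model's reply is wanted, without the echoed prompt, the prompt tokens can be sliced off before decoding; a small sketch building on the script above:

```python
# output_ids also contains the prompt tokens, so decode only what was generated.
prompt_len = inputs["input_ids"].shape[1]
answer = processor.decode(output_ids[0][prompt_len:], skip_special_tokens=True)
print(answer)
```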

Before submitting

  • [N/A] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • [ ] Did you write any new necessary tests?

rkazants added 2 commits May 4, 2026 16:46
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
@rkazants rkazants requested a review from popovaan May 4, 2026 18:39
@droans

droans commented May 4, 2026

Thanks for adding support!

I've been attempting to test this locally. However, there is an issue with the exported models and/or the OpenVINO implementation for Qwen.

I've exported copies of qwen3.5-9b along with the 27B and 35B-A3B versions of both Qwen3.5 and 3.6. These were all exported using the command optimum-cli export openvino -m qwen/qwen3.5-XXX --weight-format int4 /models/qwen3.5-xxxx-int4.

The 9B appears to work fine (aside from enabling/disabling thinking, but that's a different issue). The other two, though, are causing major issues.

First, neither of them will load on the GPU. When I attempt to load qwen3.6-27b, I receive this error:

Failed to initialize VLMPipeline: Exception from src/inference/src/cpp/core.cpp:117:
Exception from src/inference/src/dev/plugin.cpp:54:
Check 'false' failed at src/plugins/intel_gpu/src/plugin/program_builder.cpp:168:
[GPU] ProgramBuilder build failed!
Exception from src/plugins/intel_gpu/src/runtime/ocl/ocl_memory.cpp:569:
[GPU] clWaitForEvents, error code: -14 CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST
Traceback (most recent call last):
  File "/app/src/engine/ov_genai/vlm.py", line 278, in load_model
    self.model_path = VLMPipeline(
                      ^^^^^^^^^^^^
RuntimeError: Exception from src/inference/src/cpp/core.cpp:117:
Exception from src/inference/src/dev/plugin.cpp:54:
Check 'false' failed at src/plugins/intel_gpu/src/plugin/program_builder.cpp:168:
[GPU] ProgramBuilder build failed!
Exception from src/plugins/intel_gpu/src/runtime/ocl/ocl_memory.cpp:569:
[GPU] clWaitForEvents, error code: -14 CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST

When I attempt to do the same with qwen3.6-35b-a3b, I receive an error that originates from the same call but is slightly different:

Failed to initialize VLMPipeline: Exception from src/inference/src/cpp/core.cpp:117:
Exception from src/inference/src/dev/plugin.cpp:54:
Check 'false' failed at src/plugins/intel_gpu/src/plugin/program_builder.cpp:168:
[GPU] ProgramBuilder build failed!
Exception from src/plugins/intel_gpu/src/runtime/ocl/ocl_memory.cpp:569:
[GPU] clWaitForEvents, error code: -58 CL_INVALID_EVENT
Traceback (most recent call last):
  File "/app/src/engine/ov_genai/vlm.py", line 278, in load_model
    self.model_path = VLMPipeline(
                      ^^^^^^^^^^^^
RuntimeError: Exception from src/inference/src/cpp/core.cpp:117:
Exception from src/inference/src/dev/plugin.cpp:54:
Check 'false' failed at src/plugins/intel_gpu/src/plugin/program_builder.cpp:168:
[GPU] ProgramBuilder build failed!
Exception from src/plugins/intel_gpu/src/runtime/ocl/ocl_memory.cpp:569:
[GPU] clWaitForEvents, error code: -58 CL_INVALID_EVENT

Second, the models are generating gibberish. I'm able to get them to load on my CPU, but the response doesn't make any sense:

Prompt: Tell me a joke
Response: <>D*!<*M@0IM#6/8=IQH4&Q.I"JG170O)JP$CLL8QH4Q&HB;2@>C&PF&;5Q=K$2%+.'?1!C!R!IPN(%E!MN(7G8B0B3+E'FA!)1.OM@,P;0+@3,0>-5QQ-@C1;B0A'$KP)BJ?7@JB788LQ:8/!%2?O%#8'E#9;(A65NG+(..L=:&N7".7"=A?B'0D*#=KI><O3J)?CB.=D2#8M-F.I>6.@R6P(%%8(;#45IB6->A68(>20:&PR0IM!!Q?QADB##I!FML';#7.E><.O0801MR6C)7M,6=4&D%;7NEDQ,*CNLO3:!3'*L45.'5OG,()%M2/L)8?*R?EP2F31&9/1K3?3HQRLEP"!D>OH3:@/.(R,"=!B&L>F28A<I++RK+P.2%98R(-7//M9A6(8)!3<O*GFBQ%B&!6D,Q+%BN='ORF61NI0>1J:@?LA=MH6KBP%(9+HPR26-A+P:QOD+CM2)0Q+=K>4LF;B:='3DDIQ.*K+C39'E2PO$FF7N+7F<2>/8+##-B;G?<"?BJH,#D>KG:9>8HL!1+8%RLP9PDH9@DL9#**K!?8ELCH,,QF9?@=)5'MJ#K5:'R,75L.8E8HN5$E1I7A5$R"'1/P2??&AC6G−7BB4−MMF<JL)27=− 
′
 >/QA9=(−3)G>?=6) 
′
 K%"/DH9GQ:6A9.5P30QOHM<:C&C(0=&'1ICPCK'N.<7%N67%42LI7NM(Q(>1DFO@2-$3?HAPR%:>P<%7P@.LJCN,47C7@@IPG"M%-2FOAP;E%EI?&K5&"(7!P7L7%/17RR!JO&?:M0G<-OR>-')?9;,!-8),/5A3J@490,)GAH4#G-(BI-@B":*M9<M#JD25%)8J)"I/2R%E"4-,I8(N6#',--R;9P6RE(I/N2;6@=')))1F(RH3:%#A6JER2:<P+@<L:PEFO(C8',D%DG!#BC%5(')!;I86G8?<8=C>-6+5.<8==%85O-6:DA=)6&OF@11"8L3/N-1%6;:E:FI(&@PD6+BE@B<L=<C#*O@D#?Q)O&<")7:5COMEN@?=8+(;7.#(&B2PI?R0%!8>J8N?B(%4(:01OHN7<%,:I(47QRJ5&@666!1'5,<L@:)"<<'"(J#8/N;O;3AI-P.AN3>6>%RN>3CB1%@G)(N-LOM/:GQNK2M6KBD&&A1,?)>Q0)?883B1#(+G"!AQ5#@R6:M)"%>1M8>'E70E0>J05IO,,E%2"@B4=8.6P;(=17C$GO-'M=&L$3PM;1;>>"FP*/">,6IHI+C$(K06L?779!>I-73,6/=QQ;Q%:88M7O;9=%5RI/B'E*DI@IIOM)!BG@JN0B21&A;J5$5#1DO.2I;&!7DB@OP6;(L6O?&"5P&H),KE?:$2)@RA7&7=R?;O=CG:GR)3/<N$6%6C1KDMM&AEGHMFI"3O 

I can submit an issue, provide more information, and/or move this discussion to openvinotoolkit/openvino.genai if necessary.

@rkazants
Collaborator Author

rkazants commented May 5, 2026

I can submit an issue, provide more information, and/or move this discussion to openvinotoolkit/openvino.genai if necessary.

Hi @droans,

Thanks for reporting this. Regarding the CPU issue, you are probably using the latest OpenVINO nightly build, where we have a regression. We are waiting for this PR to be merged: openvinotoolkit/openvino#35640

Regarding the GPU failures, that is a problem on the GPU plugin side.

Could you please create a GitHub issue and provide reproducers using the optimum-intel API: https://github.com/huggingface/optimum-intel/issues?

Best regards,
Roman
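
For reference, a minimal optimum-intel reproducer for the GPU load failure could look like the sketch below (the local model path and the device argument are assumptions, not taken from this thread):

```python
from optimum.intel.openvino import OVModelForVisualCausalLM

# Hypothetical path to one of the INT4 exports mentioned above.
model_dir = "/models/qwen3.6-27b-int4"

# Compile the exported model for the GPU plugin; if the bug reproduces,
# this is where the ProgramBuilder / clWaitForEvents error should surface.
model = OVModelForVisualCausalLM.from_pretrained(model_dir, device="GPU")
```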

rkazants added 5 commits May 5, 2026 04:51
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
@rkazants rkazants added the openvino-slow Runs OpenVINO slow tests with different versions of transformers label May 5, 2026
@rkazants rkazants merged commit 8ec3275 into huggingface:main May 5, 2026
62 of 80 checks passed
pull Bot pushed a commit to j3din00b/openvino.genai that referenced this pull request May 6, 2026
## Description
This PR enables the Qwen3.5 model in the VLM pipeline (SDPA use case only) and
updates tests and documentation.

Requires huggingface/optimum-intel#1689 for
model export.

Current WWB accuracy results:
```
Optimum vs HF
INFO:whowhatbench.wwb:Metrics for model: models/qwen3_5_0_8b_fp16
INFO:whowhatbench.wwb:   similarity
0    0.990854
```

```
GenAI vs Optimum (default vision preprocessing)
INFO:whowhatbench.wwb:Metrics for model: models/qwen3_5_0_8b_fp16
INFO:whowhatbench.wwb:   similarity
0    0.939989
```

```
GenAI vs Optimum (VISION_PREPROCESS=CPP)
INFO:whowhatbench.wwb:Metrics for model: models/qwen3_5_0_8b_fp16
INFO:whowhatbench.wwb:   similarity
0    0.959576
```


CVS-181273


## Checklist:
- [x] This PR follows [GenAI Contributing
guidelines](https://github.com/openvinotoolkit/openvino.genai?tab=contributing-ov-file#contributing).
- [x] Tests have been updated or added to cover the new code.
- [x] This PR fully addresses the ticket.
- [x] I have made corresponding changes to the documentation.

---------

Co-authored-by: Copilot <copilot@github.com>

Labels

openvino-slow Runs OpenVINO slow tests with different versions of transformers
